UniNE at CLEF 2015 Author Profiling: Notebook for PAN at CLEF 2015
Abstract
This paper describes and evaluates an effective author profiling model called SPATIUM-L1. The suggested strategy adapts readily to tweets written in different languages (such as Dutch, English, Italian, and Spanish). As features, we suggest using the 200 most frequent terms of the query text (isolated words and punctuation symbols). Applying a simple distance measure and looking at the three nearest neighbors, we can determine the gender (with the nominal values male and female), the age group (on the ordinal scale 18-24 | 25-34 | 35-49 | >50), and the Big Five personality traits (extraversion, neuroticism, agreeableness, conscientiousness, and openness on an interval scale containing eleven items). Evaluations are based on four test collections (PAN Author Profiling task at CLEF 2015).
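The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only mentions "a simple distance measure", so the L1 (Manhattan) distance over relative term frequencies is an assumption suggested by the model's name, and the tokenization rule, the majority vote, and all function names (`tokenize`, `relative_freqs`, `l1_distance`, `predict`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenization: lowercase, isolated words and punctuation symbols.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def relative_freqs(tokens):
    # Relative frequency of each term in a token list.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def l1_distance(query_text, profile_text, k=200):
    # Features: the k most frequent terms of the query text.
    query_tokens = tokenize(query_text)
    top_terms = [term for term, _ in Counter(query_tokens).most_common(k)]
    q = relative_freqs(query_tokens)
    p = relative_freqs(tokenize(profile_text))
    # Assumed distance: sum of absolute frequency differences (L1).
    return sum(abs(q[term] - p.get(term, 0.0)) for term in top_terms)


def predict(query_text, labelled_profiles, k=200, neighbors=3):
    # labelled_profiles: list of (text, label) pairs from a training set.
    ranked = sorted(
        labelled_profiles,
        key=lambda item: l1_distance(query_text, item[0], k),
    )
    # Assumed decision rule: majority vote among the nearest neighbors.
    votes = Counter(label for _, label in ranked[:neighbors])
    return votes.most_common(1)[0][0]
```

A majority vote fits a nominal attribute such as gender. The abstract does not say how the three neighbors are combined for the ordinal age group or the interval-scaled personality traits, so the sketch leaves that open.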
Similar resources
UniNE at CLEF 2015 Author Identification: Notebook for PAN at CLEF 2015
This paper describes and evaluates an unsupervised authorship verification model called SPATIUM-L1. The suggested strategy can be adapted without any problem to different languages (such as Dutch, English, Greek, and Spanish) whose genre and topic differ significantly. As features, we suggest using the k most frequent terms of the disputed text (isolated words and punctuation symbols with ...
Segmenting Target Audiences: Automatic Author Profiling using Tweets: Notebook for PAN at CLEF 2015
This paper describes a methodology proposed for author profiling using natural language processing and machine learning techniques. We used lexical information in the learning process. For those languages without lexicons, we automatically translated them, in order to be able to use this information. Finally, we will discuss how we applied this methodology to the 3rd Author Profiling Task at PA...
XRCE Personal Language Analytics Engine for Multilingual Author Profiling: Notebook for PAN at CLEF 2015
This technical notebook describes the methodology used – and results achieved – for the PAN 2015 Author Profiling Challenge by the team from Xerox Research Centre Europe (XRCE). This year, personality traits are introduced alongside age and gender in a corpus of tweets in four languages – English, Spanish, Italian and Dutch. We describe a largely language agnostic methodology for classification...
Statistical Learning Methods for Profiling Analysis: Notebook for PAN at CLEF 2015
Author profiling is the task of inferring information about an author by analyzing her/his writing style. Its application in forensics, business intelligence and psychology makes this topic interesting to research. In this notebook, we present our baseline approach using SVM and Linear Discriminant Analysis (LDA) classifiers. We analyze features obtained from LIWC dictionaries, these are ...
Syntactic N-grams as Features for the Author Profiling Task: Notebook for PAN at CLEF 2015
This paper describes our approach to the Author Profiling task at PAN 2015. Our method relies on syntactic features, such as syntactic-based n-grams of various types, in order to predict the age, gender and personality traits of the author of a given text. In this paper, we describe the features used, the classification algorithm employed, and other general ideas concerning the expe...